fix(grafanactl): reconcile stale URLs and delete orphaned Grafana datasources #258
cssjr wants to merge 9 commits into
Pull request overview
This PR enhances grafanactl modify datasource reconcile so it not only reconciles Azure Monitor Workspace (AMW) integrations on the Managed Grafana ARM resource, but also detects and fixes stale Managed_Prometheus_* Prometheus datasource URLs in Grafana by aligning them with each workspace’s current PrometheusQueryEndpoint.
Changes:
- Added a Grafana client helper to update an existing datasource via the Grafana API.
- Wired a Grafana API client into the modify datasource reconcile command execution path.
- Implemented datasource URL reconciliation logic for Managed_Prometheus_* Prometheus datasources, honoring --dry-run.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tools/grafanactl/internal/grafana/client.go | Adds UpdateDataSource wrapper to support updating datasources via Grafana API. |
| tools/grafanactl/cmd/modify/options.go | Instantiates and carries a Grafana API client alongside existing ARM clients. |
| tools/grafanactl/cmd/modify/cmd.go | Collects AMW query endpoints and updates stale Managed_Prometheus_* datasource URLs (supports dry-run). |
fdc4ac3 to ba01314
The modify datasource reconcile command only managed Azure Monitor Workspace integrations (resource IDs) but never checked the actual datasource URLs in Grafana. When an AMW Prometheus query endpoint hostname changes, datasource URLs become stale and dashboards fail with DNS resolution errors.

After integration reconciliation, the command now lists Grafana datasources, compares each Managed_Prometheus_* URL against the current AMW PrometheusQueryEndpoint, and updates any that differ.

Fixes: ARO-27914
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
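The comparison step described in this commit message could look roughly like the sketch below. It is not the code from the PR: the `DataSource` type, the `planURLUpdates` helper, and the assumption that the suffix after `Managed_Prometheus_` is the workspace name are all hypothetical, chosen only to make the idea concrete. The example hostnames are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// DataSource is a simplified, hypothetical stand-in for a Grafana datasource.
type DataSource struct {
	Name string
	Type string
	URL  string
}

// managedPrefix is the naming prefix mentioned in the PR.
const managedPrefix = "Managed_Prometheus_"

// planURLUpdates returns copies of the managed Prometheus datasources whose URL
// differs from the current endpoint, with URL already set to the new value.
// endpoints maps workspace name to its current PrometheusQueryEndpoint.
// Assumption: the datasource name suffix identifies the workspace.
func planURLUpdates(dss []DataSource, endpoints map[string]string) []DataSource {
	var updates []DataSource
	for _, ds := range dss {
		if ds.Type != "prometheus" || !strings.HasPrefix(ds.Name, managedPrefix) {
			continue // not a managed Prometheus datasource
		}
		want, ok := endpoints[strings.TrimPrefix(ds.Name, managedPrefix)]
		if !ok || want == ds.URL {
			continue // no known endpoint, or URL already current
		}
		ds.URL = want
		updates = append(updates, ds)
	}
	return updates
}

func main() {
	dryRun := true
	current := []DataSource{
		{Name: "Managed_Prometheus_ws1", Type: "prometheus", URL: "https://old.example.invalid"},
		{Name: "Managed_Prometheus_ws2", Type: "prometheus", URL: "https://ws2.example.invalid"},
	}
	endpoints := map[string]string{
		"ws1": "https://new.example.invalid",
		"ws2": "https://ws2.example.invalid",
	}
	for _, ds := range planURLUpdates(current, endpoints) {
		if dryRun {
			fmt.Printf("dry-run: would update %s to %s\n", ds.Name, ds.URL)
			continue
		}
		// A real implementation would call the Grafana API here.
	}
}
```

Separating the planning step from the API call keeps the dry-run path and the real path on the same comparison logic.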
ba01314 to dc0e6a9
…URL updates
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
workspace.Properties.Metrics.PrometheusQueryEndpoint == nil {
continue
}
if *workspace.Properties.ProvisioningState == armmonitor.ProvisioningStateSucceeded {
Workspaces can be in another provisioning state (something something Updating) intermittently.
It now skips workspaces if they are Failed or Canceled but includes workspaces in any other state (Succeeded, Updating, Creating). If a workspace is Creating but doesn't have an endpoint yet, it eventually gets skipped at the nil check.
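A minimal sketch of the filter described in this reply follows. The name `isTerminalFailureState` appears in a later commit message on this PR, but its signature here is a guess. `ProvisioningState`, `Workspace`, and `endpointsFor` are local, hypothetical stand-ins rather than the `armmonitor` types.

```go
package main

import "fmt"

// ProvisioningState is a local stand-in for armmonitor.ProvisioningState.
type ProvisioningState string

const (
	StateSucceeded ProvisioningState = "Succeeded"
	StateUpdating  ProvisioningState = "Updating"
	StateCreating  ProvisioningState = "Creating"
	StateFailed    ProvisioningState = "Failed"
	StateCanceled  ProvisioningState = "Canceled"
)

// Workspace is a simplified, hypothetical view of an Azure Monitor Workspace.
type Workspace struct {
	Name     string
	State    ProvisioningState
	Endpoint *string // nil until a PrometheusQueryEndpoint exists
}

// isTerminalFailureState reports whether the workspace ended in a failure state.
// Transitional states such as Creating or Updating return false.
func isTerminalFailureState(s ProvisioningState) bool {
	return s == StateFailed || s == StateCanceled
}

// endpointsFor collects workspace name -> endpoint, skipping failed workspaces
// and workspaces that do not have an endpoint yet (the nil check).
func endpointsFor(workspaces []Workspace) map[string]string {
	out := make(map[string]string)
	for _, ws := range workspaces {
		if isTerminalFailureState(ws.State) || ws.Endpoint == nil {
			continue
		}
		out[ws.Name] = *ws.Endpoint
	}
	return out
}

func main() {
	ep := "https://ws.example.invalid"
	got := endpointsFor([]Workspace{
		{Name: "updating", State: StateUpdating, Endpoint: &ep},
		{Name: "creating", State: StateCreating, Endpoint: nil},
		{Name: "failed", State: StateFailed, Endpoint: &ep},
	})
	fmt.Println(got)
}
```

Only the Updating workspace survives here: the Creating one has no endpoint yet and the Failed one is excluded by state.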
…asources

The modify datasource reconcile command only managed Azure Monitor Workspace integrations (resource IDs) but never checked the actual datasource URLs or removed orphaned datasources in Grafana.

After integration reconciliation, the command now:
- Updates datasource URLs when the AMW PrometheusQueryEndpoint has changed (fixes DNS resolution errors from stale hostnames)
- Deletes orphaned Managed_Prometheus_* datasources whose workspaces no longer exist

Both operations respect --dry-run. This consolidates the cleanup previously handled by the separate clean fixup-datasources command into the pipeline-integrated reconcile step.

Fixes: ARO-27914
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dpoint presence

A workspace that exists but has no PrometheusQueryEndpoint yet would be kept as an integration (causing Grafana to maintain its datasource) but treated as orphaned by datasource reconciliation (causing deletion). This created a delete-recreate loop.

Now orphan detection checks whether the workspace exists at all, and only skips URL comparison for workspaces without endpoints yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tale-grafana-datasource-urls

Adds orphaned datasource deletion and fixes workspace existence check to avoid delete-recreate loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Workspaces can be in transitional states like Creating or Updating intermittently. Only exclude workspaces in terminal failure states (Failed, Canceled) from integration and orphan detection, so that workspaces being updated are not temporarily removed and recreated.

getWorkspaceEndpoints still requires Succeeded since transitional workspaces may not have a valid PrometheusQueryEndpoint yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Include datasource name and ID in UpdateDataSource error messages for easier troubleshooting
- Collect all reconciliation errors (both deletes and updates) using errors.Join instead of returning on the first update failure, which would silently drop accumulated delete errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
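The error-collection pattern named in this commit can be sketched as follows. `errors.Join` is the standard library function (Go 1.20+). The `op` type, `applyAll`, and `errSimulated` are hypothetical names for illustration, not code from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a placeholder failure used only by this example.
var errSimulated = errors.New("simulated API failure")

// op is a hypothetical pending change to one Grafana datasource.
type op struct {
	Kind string // "delete" or "update"
	Name string
	ID   int64
}

// applyAll runs every operation and keeps going after a failure, so an
// early error never hides the ones that follow. Each error is wrapped with
// the datasource name and ID to make troubleshooting easier.
func applyAll(ops []op, apply func(op) error) error {
	var errs []error
	for _, o := range ops {
		if err := apply(o); err != nil {
			errs = append(errs, fmt.Errorf("%s datasource %q (id %d): %w", o.Kind, o.Name, o.ID, err))
		}
	}
	// errors.Join returns nil when there is nothing to join.
	return errors.Join(errs...)
}

func main() {
	ops := []op{
		{Kind: "delete", Name: "Managed_Prometheus_gone", ID: 7},
		{Kind: "update", Name: "Managed_Prometheus_ws1", ID: 9},
	}
	err := applyAll(ops, func(o op) error { return errSimulated })
	fmt.Println(err)
}
```

Because each wrapped error uses `%w`, callers can still match individual causes with `errors.Is` on the joined result.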
A workspace transitioning through Updating still has its PrometheusQueryEndpoint from the previous Succeeded state. Use isTerminalFailureState instead of requiring Succeeded, and rely on the existing nil guard for workspaces that genuinely lack an endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r exist

Orphan detection now includes all workspaces regardless of provisioning state. A workspace in Failed or Canceled state still exists as an Azure resource, so its datasource should be preserved: it may help identify broken clusters through the Grafana UI, and the pipeline will restore everything when the workspace is fixed.

Datasources are only deleted when the workspace is truly gone (not returned by the API at all). Provisioning state filtering remains only for integration reconciliation and endpoint URL updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
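The final decision rule from this commit message can be summarised in a small sketch. Everything here is illustrative: the `Workspace` and `Action` types, the `decide` function, and the assumption that the datasource name suffix is the workspace name are hypothetical and not taken from the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Workspace is a simplified, hypothetical view of an Azure Monitor Workspace.
type Workspace struct {
	Endpoint string // empty while no PrometheusQueryEndpoint exists
	Failed   bool   // true for the Failed or Canceled provisioning states
}

// Action is what reconciliation should do with one datasource.
type Action string

const (
	Keep   Action = "keep"
	Update Action = "update"
	Delete Action = "delete"
)

// decide applies the rule from the commit message: delete only when the
// workspace is not returned at all, update only for healthy workspaces whose
// endpoint differs, and keep everything else.
// Assumption: the name after "Managed_Prometheus_" is the workspace name.
func decide(dsName, dsURL string, workspaces map[string]Workspace) Action {
	const prefix = "Managed_Prometheus_"
	if !strings.HasPrefix(dsName, prefix) {
		return Keep // not managed by this command
	}
	ws, exists := workspaces[strings.TrimPrefix(dsName, prefix)]
	if !exists {
		return Delete // workspace is truly gone
	}
	if ws.Failed || ws.Endpoint == "" || ws.Endpoint == dsURL {
		return Keep // preserve failed, endpoint-less, and up-to-date datasources
	}
	return Update
}

func main() {
	workspaces := map[string]Workspace{
		"ok":     {Endpoint: "https://new.example.invalid"},
		"failed": {Endpoint: "https://new.example.invalid", Failed: true},
		"fresh":  {},
	}
	for _, name := range []string{"ok", "failed", "fresh", "gone"} {
		a := decide("Managed_Prometheus_"+name, "https://old.example.invalid", workspaces)
		fmt.Println(name, a)
	}
}
```

Keeping the existence check separate from the endpoint check is what avoids the delete-recreate loop described in the earlier commit.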
/lgtm
/hold
Summary
- modify datasource reconcile command only managed Azure Monitor Workspace integrations (resource IDs on the Grafana ARM resource) but never checked the actual datasource URLs or removed orphaned datasources in Grafana.
- Updates datasource URLs when the AMW PrometheusQueryEndpoint hostname has changed (fixes DNS resolution errors)
- Deletes orphaned Managed_Prometheus_* datasources whose workspaces no longer exist (consolidates clean fixup-datasources logic into the pipeline step)
- Both operations respect --dry-run.

Fixes: ARO-27914, AROSLSRE-1347, AROSLSRE-585
Test plan
- cd tools/grafanactl && go build ./... (compiles cleanly)
- cd tools/grafanactl && go vet ./... (no issues)

🤖 Generated with Claude Code